Agglomerative Clustering of Bagged Data Using Joint Distributions

نویسندگان

  • David Arbour
  • James Atwood
  • Ahmed El-Kishky
  • David Jensen
چکیده

Current methods for hierarchical clustering of data either operate on features of the data or make limiting model assumptions. We present the hierarchy discovery algorithm (HDA), a model-based hierarchical clustering method based on explicit comparison of joint distributions via Bayesian network learning for predefined groups of data. HDA works on both continuous and discrete data and offers a model-based approach to agglomerative clustering that does not require prespecification of the model dependency structure.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Clustering Large Data Sets Described With Discrete Distributions and An Application on TIMSS Data Set

Symbolic Data Analysis is based on a special descriptions of data – symbolic objects. Such descriptions preserve more detailed information about data than the usual representations with mean values. A special kind of symbolic object is also representation with distributions. In the clustering process this representation enables us to consider the variables of all types at the same time. We pres...

متن کامل

Clustering large data sets described with discrete distributions and its application on TIMSS data set

Symbolic Data Analysis is based on a special descriptions of data – symbolic objects. Such descriptions preserve more detailed information about data than the standard representations with mean values. A special kind of symbolic object is also representation with distributions. In the clustering process this representation enables us to consider the variables of all types at the same time. We p...

متن کامل

Hierarchical clustering of word class distributions

We propose an unsupervised approach to POS tagging where first we associate each word type with a probability distribution over word classes using Latent Dirichlet Allocation. Then we create a hierarchical clustering of the word types: we use an agglomerative clustering algorithm where the distance between clusters is defined as the JensenShannon divergence between the probability distributions...

متن کامل

Hierarchical clustering of word class distributions

We propose an unsupervised approach to POS tagging where first we associate each word type with a probability distribution over word classes using Latent Dirichlet Allocation. Then we create a hierarchical clustering of the word types: we use an agglomerative clustering algorithm where the distance between clusters is defined as the JensenShannon divergence between the probability distributions...

متن کامل

Classifying qualitative information using centroid based methods

In this paper we consider the problem of classification of qualitative data. We present an approach to overcome several difficulties which show SAHN (sequential, agglomerative, hierarchic, nonoverlapping) clustering methods to classify qualitative data when using centroids to compute similarities between pairs of classes. The approach is based on a recently proposed class of qualitative weighte...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013